Evaluation of Modeled Hyperspectral Infrared Spectra Against All‐Sky AIRS Observations Using Different Cloud Overlap Schemes

Abstract Hyperspectral infrared sounding contains information about clouds, which plays an important role in modulating Earth's climate. However, there is a great deal of uncertainty in modeling the radiative effect of clouds due to its complex dependence on various parameters. Therefore, cloudy scenarios are often neglected in retrievals of infrared spectral measurements and in data assimilation. One‐dimensional radiative transfer (RT) models have a limited capability to represent the cloud three‐dimensional multilayer structure. This issue is typically resolved by using a multiple independent column approach, which is computationally demanding. Therefore, it is necessary to find a balance between computational speed and accuracy for infrared RT all‐sky radiance simulations. In this study, we utilize the Community Radiative Transfer Model with four different cloud overlap schemes and compare against observations made by the Atmospheric Infrared Sounder (AIRS) using a statistical metric called the first Wasserstein distance. Our results show that the average cloud overlap scheme performs the best and successfully predicts the overall probability distribution of brightness temperature over nonfrozen oceans for a wide range of wavelengths. The mean absolute differences are less than 0.7 K for 846 selected AIRS channels between 790 cm−1 and 1231 cm−1.

• Hyperspectral simulations, for more than 80,000 scenarios, using different cloud overlap schemes are compared with Atmospheric Infrared Sounder (AIRS) observations • Average overlap scheme provides best results compared with AIRS observations • First Wasserstein distance is a more suitable metric than the commonly used Pearson correlation for comparing probability distributions Correspondence to: V. Natraj, vijay.natraj@jpl.nasa.gov

Citation:
Le, T., Natraj, V., Braverman, A. J., & Yung, Y. L. (2022). Evaluation of modeled hyperspectral infrared spectra against all-sky AIRS observations using different cloud overlap schemes. Earth and Space Science, 9, e2022EA002245. https://doi.org/10.1029/2022EA002245 cloud distribution is modeled by making assumptions about cloud overlap. Typical cloud overlap assumptions include maximum, random, and maximum-random overlap. Geer et al. (2009) developed an average overlap scheme for microwave RT. In all cases, radiances are computed for several atmospheric "columns" with the effective radiance calculated using a weighted average of the individual column radiances. The columns are constructed from the horizontal cloud fraction at each atmospheric level (which is assumed to be known). Different overlap assumptions then determine how the cloudy layers are stacked in the vertical. In the case of maximum overlap, all the cloudy layers are concentrated in the same columns as much as possible. For random overlap, the cloudy layers are distributed randomly across the columns. Maximum-random overlap assumes that the cloudy columns are maximally overlapped in adjacent vertical layers that are both cloudy but randomly distributed where there is a cloud-free layer in between. Average overlap assumes an average cloud fraction over the whole vertical profile with weight based on the total hydrometeor densities (e.g., liquid/ice cloud, rain, and snow). Hogan and Illingworth (2000) introduced an exponential-random overlap scheme, where the total cloud cover is given by a weighted average of the maximum-random and random overlap estimates of cloud cover. The weight is a function of the grid resolution in the general circulation model.
The drawback of the maximum and random overlap methods is that they are based on geometric assumptions that are too simplified to handle multilayered clouds (Tian & Curry, 1989). Maximum-random overlap, on the other hand, is more realistic; however, about 10-100 columns are required to accurately represent typical cloudy atmospheric scenarios (Chen et al., 2013). Therefore, using the maximum-random overlap assumption significantly increases the runtime of RT models, necessitating the use of fast and accurate cloud overlap methods. The hydrometeor-weighted average overlap approach has been shown to reduce errors by 40% in areas of clouds and precipitation (Geer et al., 2009) with the usage of just two columns. O'Dell et al. (2007) presented an alternative two and three independent column scheme for microwave RT, which they referred to as the "optimal" approach. The two-column optimal scheme is more accurate than approaches with equally weighted columns but incurs ∼75% more computational cost because it requires an extra RT calculation.
Recent studies have made great progress in the usage of all-sky microwave and infrared radiances for assimilation (Geer et al., , 2019. Geer et al. (2019) tested a multiple independent column approach under the maximum-random overlap assumption in the RT for TOVS (RTTOV) model. However, this approach is computationally demanding, taking about 34 times longer than clear sky RT in the European Center for Medium-Range Weather Forecasting (ECMWF) system (Geer et al., 2019). Therefore, we need to find a balance between computation burden and accuracy for all-sky infrared radiance assimilation. In this study, we evaluate the Community Radiative Transfer Model (CRTM) using four different cloud overlap schemes.
In order to evaluate the performance of CRTM for all-sky radiance simulations, we simulate tens of thousands of scenarios and compare against AIRS observations by looking at the probability distribution function of the difference between the surface temperature (ST) (obtained from European Centre for Medium-Range Weather Forecasting (ECMWF) model estimates) and the brightness temperature in several channels. This difference is a measure of the radiometric effect of clouds. Aumann et al. (2018) used Pearson correlation of brightness temperature histograms to evaluate RT model performance. This work differs from the Aumann et al. (2018) work in four important ways. First, the data used here have a much better spatiotemporal match with AIRS observations (see Section 2.2). Second, we perform simulations for about 30 times more scenarios. Third, we perform model-observation comparisons for 846 selected AIRS channels between 790 cm −1 and 1,231 cm −1 compared to the two channels used by Aumann et al. (2018). Fourth, we demonstrate that the Pearson correlation is not an ideal approach to compare probability distributions (see Section 2.4). Instead, we utilize a statistical metric, called the first Wasserstein distance, to quantitatively measure how far model probability distributions deviate from AIRS observations. Compared to the Pearson correlation technique, the first Wasserstein distance is less sensitive to the choice of histogram bins and provides a better characterization of the overall shape of the distribution.
The paper is organized as follows. In Section 2, we describe the data set and relevant models used in this work. We also introduce the first Wasserstein distance statistical metric that is used for the model-measurement intercomparison. Section 3 summarizes the simulation scenarios used in this study and discusses the evaluation of all-sky CRTM simulated radiances under different cloud overlap schemes against AIRS observations. The major findings are described in Section 4.

AIRS
AIRS is a grating array spectrometer covering the thermal infrared and shortwave infrared spectral range with 2,378 channels. The instrument spectral resolving power is = 1,200. The noise is typically smaller than 0.2 K. The nadir footprint of AIRS is 13.5 km from a 705 km orbit with scans of about ±49.5° from nadir (Aumann et al., 2003). AIRS observations in the infrared region enable atmospheric temperature and water vapor vertical profile retrieval.
We used AIRS observations between 31 October 2018 21:00 UTC and 01 November 2018 09:00 UTC. Land and frozen ocean scenarios are excluded to avoid the possibility of introducing errors due to biases in modeled surface albedos. Overall, 82,271 scenarios are used in this study (see Table 1 for details).

ECMWF
The model profiles are generated by the ECMWF operational global weather forecasting system (ECMWF, 2009). The best available estimate of the atmospheric state is taken from the short-range forecast. The ECMWF atmospheric states used by Aumann et al. (2018) were estimated using 3 hr and 0.25° horizontal resolution (∼25 km at the equator) time/space interpolation, which resulted in matchup errors for the comparison with AIRS observations. For this study, we employ the 15 min and 9 km time/space interpolation data used by the ECMWF integrated forecast system. We obtain atmospheric-state profiles at 137 vertical levels, including pressure, temperature, O 3 , H 2 O, liquid/ice cloud content, and cloud cover.

Community Radiative Transfer Model
CRTM is a fast RT model developed by the Joint Center for Satellite Data Assimilation in the United States (Han et al., 2006). CRTM simulates satellite infrared and microwave radiances with respect to atmospheric-state variables (e.g., temperature, pressure, humidity, water and ice cloud content, and trace gas concentrations). It consists of the following key modules: gaseous transmission, surface emission and reflection, cloud and aerosol absorption and scattering, and RT solver. The default RT algorithm used for scattering calculations is the Advanced Doubling-Adding (ADA) algorithm. CRTM also contains a k-matrix module for Jacobian calculations, a tangent-linear module, and an adjoint module, which are important for radiance assimilation and for the inversion part of retrievals (Garand et al., 2001;Geer et al., 2019;Li et al., 2022;Weng & Liu, 2003).
In this study, we use CRTM version 2.4.0. Its cloud module has six cloud types: water, ice, rain, snow, graupel, and hail, which are defined by the cloud particle densities. We only use water and ice clouds in our calculations. The cloud optical properties (i.e., mass extinction coefficient, single scattering albedo, and asymmetry parameter) data used in this study are taken from the default cloud property lookup table (CloudCoeff.bin, version 3.0.4). The optical properties of water clouds with density less than 0.9 g/cm 3 are generated based on Mie theory by using a modified gamma distribution (Han et al., 2006). The ice cloud optical properties are based on the MODIS collection 5 (MC5) ice habit model (Baum et al., 2005;Ding et al., 2011). This model chooses a habit mixture scheme (including droxtals, 3D bullet rosettes, hexagonal plates, hollow/solid columns, and aggregates) based on the particle diameter. For individual habits, the optical properties are computed at 65 wavelengths between 0.2 and 100 μm and for 45 size bins.
CRTM also requires the cloud particle effective radius vertical profile. To obtain the effective radius, we use the formulation in Bower et al. (1994) for water clouds and the temperature dependence obtained by Ou et al. (1995Ou et al. ( , 2013  Note. We denote |lat| ≤ 30, |lat| in (30,60], and |lat| > 60 as tropical, midlatitude, and polar zone, respectively. where is the clear sky top of the atmosphere (TOA) radiance, is the cloudy sky TOA radiance (where both water and ice cloud are included), and is the total cloud cover. Note that the values of and are both dependent on the selected cloud overlap scheme. For level i in a n-level atmosphere, we denote the cloud cover as and the total cloud content for all cloud types as . Then, tcc can be calculated as follows: The maximum-random overlap scheme in CRTM only uses two columns, possibly reducing its accuracy.
The calculated radiance spectra for the two columns are then combined using Equation 1 to obtain the simulated AIRS radiances using temperature, cloud, water, and ozone profile inputs from ECMWF. The CO 2 concentration is based on the U.S. 1976 Standard Atmosphere profile but is scaled to 405 ppmv (representative of global mean concentrations for the year 2018). Note that we ignore the seasonal cycle of CO 2 concentration although its amplitude can be as high as 12 ppmv at some latitudes.

The First Wasserstein Distance
In order to evaluate the performance of the model simulations, we quantitatively measure how far the modeled probability distributions deviate from AIRS observations. More specifically, we analyze probability distributions of the difference between the surface temperature and the channel brightness temperature. Aumann et al. (2018) used the Pearson correlation to compare the model and measurement probability distributions. However, while the Pearson correlation coefficient is a measure of the linear correlation between two variables, it is not suitable for comparing probability distributions. A high degree of correlation does not necessarily imply a high causal relationship. This is illustrated in Figure 1 for a sample AIRS observation. The black-dashed line is the probability distribution of the difference between the surface temperature and the brightness temperature in the 901 cm −1 channel for the nonfrozen ocean scenarios described in Table 1. We do a linear rescaling of the AIRS probability distribution to obtain five synthetic probability distributions (solid colored lines). All the probability distributions shown in Figure 1 have a Pearson correlation coefficient of 1.0 despite clear differences between the distributions and the actual AIRS observations. This degeneracy can be resolved by employing the Wasserstein distance, which is a measurement of the distance between two probability distributions.
The nth Wasserstein distance between two distributions and is defined as where Π( ) is the collection of all possible joint distributions between and x and y are a pair of random vectors marginally distributed as and . inf (infimum) represents the greatest lower bound. For each possible joint distribution ∈ Π( ) , we calculate the expected value of the distance for all pairs of samples from to . The lower bound is the nth Wasserstein distance.
The first Wasserstein distance is also known as the Earth Mover's Distance (EMD), which has important applications in the field of computer science (Arjovsky et al., 2017;Rubner et al., 2000). Intuitively, if one thinks of two probability distributions as two different ways of piling up a certain amount of dirt, then EMD is the minimum 10.1029/2022EA002245 5 of 14 cost of turning one pile into the other, where cost is assumed to be the product of the amount of dirt moved and the distance by which it is moved.
Assume that the lengths of two discrete distributions and are M and N, respectively. Further, and are values at locations and within those respective distributions. The first Wasserstein distance can then be calculated using the following formula: where is the distance between and , and is any possible way to move to . is the amount that should be moved from to .
In our study, we use the python package "SciPy" to calculate the first Wasserstein distance. Figure 1 shows the first Wasserstein distance between AIRS observations and synthetic probability distributions; the utility of the first Wasserstein distance as a metric is clearly evident. There are other statistical metrics for comparing probability distributions or histograms, such as Kullback-Leibler (KL) divergence and neighborhood statistics (Geer & Baordo, 2014;Roberts & Lean, 2008). While there are mathematical links between the first Wasserstein distance and the KL divergence (Amari et al., 2018), the advantage of the former is that it is symmetric ( ( ) = ( ) ). On the other hand, the neighborhood statistics method requires conversion of the raw data to binary fields based on some threshold filter, while the first Wasserstein distance is a nonparametric metric that does not lose information through the application of any filter. Furthermore, the first Wasserstein distance is a natural way to compare two probability distributions where one is derived from the other by making small perturbations.  Table 1 summarizes the scenarios used in this study. We denote |lat| ≤ 30, |lat| in (30,60), and |lat| > 60 as tropical, midlatitude, and polar zone, respectively. Daytime and nighttime cases are roughly balanced in the data set. Both the tropical and midlatitude zone scenarios account for ∼44% of the 82,271 cases. However, as we select only nonfrozen scenarios and because the majority of the high-latitude region is covered by ice, only ∼11% of the chosen scenarios are in the polar zone.

Results
In this study, we first choose three atmospheric window channels (901 cm −1 , 1231 cm −1 , and 2615 cm −1 ). The first two are in the thermal infrared spectral range, while the last one is in the shortwave infrared (where effects due to reflection and scattering of solar radiation need to be considered). Figure 2 shows probability distribution plots of the difference between surface temperature (ST) and channel brightness temperature (T B ) for AIRS observations (dashed black) and CRTM with average overlap (red), maximum-random overlap (blue), random overlap (green), and maximum overlap (orange) schemes in these three channels for nonfrozen ocean day and night cases. On each curve, there are 60 points from −30 to 90 K with a 2 K temperature interval. The peak of the probability distribution in the infrared channels (901 cm −1 and 1,231 cm −1 ) is near 5 K, which indicates relatively little cloudiness or a clear-sky scenario. We only observe daytime cases in the 2,615 cm −1 channel with negative (ST-T B ) values, showing surface reflection effects or strong forward scattering from the clouds. CRTM with average overlap not only successfully simulates the peak of the probability distribution for all scenarios but also reproduces the overall shape of the observed distribution. In contrast, the other overlap options produce worse results, especially with respect to the peak, with maximum overlap performing the best and random overlap the worst. The random overlap scheme places all vertical subgrid clouds in a random fashion, usually resulting in a higher tcc compared to the other cloud overlap schemes. Previous studies have also found that the random overlap scheme performs poorly for clouds in the lower troposphere (see, e.g., Hogan and Illingworth (2000)). Overall, random overlap, maximum-random overlap, and maximum overlap overestimate the total cloud fraction, resulting in a smaller brightness temperature compared with observations. Table 2 summarizes the first Wasserstein distances as well as the Pearson correlation coefficients of the simulated radiances in Figure 2 against the observations. The models with best performance for each scenario in Table 2  Note. The best performing models for each channel and metric are marked in red.

Table 2 Comparison of Community Radiative Transfer Model Performance With Different Overlap Schemes for All (Day and Night), Day Only, and Night Only Scenarios in Three Window Channels Using Two Metrics: Pearson Correlation Coefficient and First Wasserstein Distance
marked in red. Note that a smaller first Wasserstein distance or a larger Pearson correlation coefficient indicates better agreement with observations. In almost every scenario, CRTM with the average overlap provides the best results based on both the first Wasserstein distance and the Pearson correlation coefficient. The only exception is daytime at 2,615 cm −1 . For this scenario, the Pearson correlation coefficient suggests that the maximum overlap scheme is the best, while the first Wasserstein distance indicates that the average overlap scheme matches the observations better than the other options. From the daytime all zone 2,615 cm −1 subplot in Figure 2, it is evident that the average overlap results (red) provide the best match with the AIRS observations. The maximum overlap simulations produce many more cases with (ST-T B ) between −20 K and −5K, and fewer cases for the peak near 0 K. This suggests that the maximum overlap scheme underestimates low cloud effects. This special case indicates that the first Wasserstein distance is a better metric than the Pearson correlation coefficient for comparing two probability distributions. , and polar (| | > 60 ) zones, respectively. Overall, the average overlap scheme still gives the best results among the four options, reproducing both the peak and the shape of the probability distributions with high fidelity. The model performance is relatively worse in the polar zone compared to the other two zones ( Figure 5). Although all four cloud overlap schemes are able to simulate the two peaks for the daytime scenario in the polar zone, the secondary peak for this case has a negative bias (of nearly 5 K compared to AIRS observations). The models also fail to match the nighttime scenario peak for the polar zone in all three window channels. The simulations are colder than the actual AIRS observations, indicating that the cloud optical depth is overestimated and/or the clouds are placed too high in the simulations. In addition to the three window channels, we also evaluate the simulated radiances using a wide range of channels that cover CO 2 , H 2 O, and O 3 absorption regions. In this analysis, the mean T B , mean absolute T B , and the first Wasserstein distance with respect to AIRS observations are compared. Results are shown in Figure 6. Again, average overlap simulations show very good agreement with observations (T B discrepancies are less than 0.7 K) for all 846 channels, including the O 3 absorption band around 1,040 cm −1 . Figure 7 illustrates the pairwise comparison of observed AIRS 901 cm −1 T B with CRTM average overlap simulations on a global map. In general, the simulated radiances agree well with AIRS observations on a global scale; the absolute difference is less than 2 K in most cases. However, a clear regional pattern is evident, especially for the Intertropical Convergence Zone (ITCZ) where the absolute difference can be greater than 30 K.

Conclusions and Discussion
In this study, we employ CRTM with four different cloud overlap schemes to calculate the TOA radiation field in three atmospheric window channels and several hundred absorbing channels (covering the thermal and shortwave infrared spectral regions) for a large number of cloudy atmospheric scenarios and compare the results against AIRS observations using the first Wasserstein distance as a statistical metric. We use atmospheric-state profiles from ECMWF at 137 vertical levels as the inputs for CRTM. Our results show that the average overlap scheme successfully predicts the overall probability distribution of clouds over a wide range of spectral channels between 790 cm −1 and 1,231 cm −1 for over 80,000 scenarios.
CRTM simulations have zone-dependent biases, especially for the polar region. In particular, it has difficulty reproducing the secondary peak of the probability distribution function for the 2,615 cm −1 channel. Moreover, a pairwise comparison between CRTM average overlap simulations and AIRS radiances shows a regional pattern. There are three possible error sources in our simulations: (a) random and systematic error from ECMWF cloud profiles (e.g., water/ice cloud content and cloud cover), (b) error due to oversimplified assumptions about cloud overlap, and (c) error from cloud optical property coefficients. Our approach of comparing the overall probability distribution functions between model and observation can mostly cancel out the random errors in the ECMWF profiles but will not eliminate systematic biases.
Further investigations are required to evaluate the best cloud overlap assumption for different scenarios and spatial scales. The first Wasserstein distance is a superior statistical metric (compared to traditionally used metrics such as the Pearson correlation coefficient) to evaluate the fidelity of RT models with respect to observations and should be used in future RT model intercomparisons. Another approach could be to use a cloud resolving model Figure 6. (a) Mean brightness temperature for, (b) mean absolute brightness temperature difference between, and (c) first Wasserstein distance between CRTM radiance simulations and AIRS observations for 846 AIRS channels between 790 cm −1 and 1231 cm −1 . Channels with noise >1 K have been removed before performing these calculations. The color scheme is the same as in Figure 2.
(CRM) to create a large number of scenarios with different vertical cloud distributions with a statistical analysis, which then performed to compare against simulations using the multiple independent column approximation (ICA) simulations that employ the CRM cloud fields as the "truth." ICA provides the best physical reference as long as sufficient independent columns are used. It is also critical to minimize the spatiotemporal mismatch between ECMWF and AIRS locations. Clouds vary a lot in space and time and it is very easy for biases to creep in simply because of errors in characterizing a scene as clear when it is cloudy and vice versa. Comparisons with ICA simulations eliminate the issue of model errors and allow usage of more conventional statistical approaches to evaluate the performance of the two-column overlap schemes for different scenarios.
The importance of such studies is clearly stated in the recently released Earth Science Decadal Survey (National Academies of Sciences, Engineering, and Medicine, 2018), which recommends a set of observation capabilities that will enable substantial progress in (a) providing critical information on the makeup and distribution of clouds and (b) addressing key questions about how changing cloud cover and precipitation will affect climate, weather, and Earth's energy balance in the future.